import numpy as np
import plotly.graph_objs as go
import plotly.io as pio
pio.renderers.default = "notebook" # set the default plotly renderer to "notebook" (necessary for quarto to render the plots)

12 Time Series Clustering
This section is still under construction and will be completed in the near future. Please do not go beyond this point for now.
The goal of clustering is to group similar time series together based on their characteristics or patterns.
But what does similar mean in the context of time series data? Defining similarity is not straightforward due to the inherent subjectivity involved.
Difficulties are:
- Context-Dependent: Similarity can vary based on the specific application or domain.
- Human Bias: Metrics are often chosen based on human intuition, which can introduce bias.
- Inconsistent Interpretations: Different experts may have different perspectives on what constitutes similarity.
- Parameter Sensitivity: Many similarity measures come with tunable parameters that can significantly affect the results.
12.1 Clustering algorithms
Time series clustering algorithms can be broadly categorized into four main types:
- Distance-Based: These algorithms rely on a distance measure to quantify the similarity between time series. Examples include k-means and k-median.
- Distribution-Based: These methods model the distribution (in a very general sense) of the time series data and cluster based on statistical properties or descriptive features that represent the characteristics of the time series. Examples include Gaussian Mixture Models (GMM) and DBSCAN.
- Subsequence-Based: These algorithms focus on clustering representative subsequences of time series data rather than the entire series. Examples include methods based on shapelets and sliding windows.
- Representation-Learning-Based: These methods use techniques like autoencoders or recurrent neural networks to learn a lower-dimensional representation of the time series data, which is then clustered using traditional clustering algorithms.
Paparrizos, Yang, and Li (2024) gives a good overview of different time series clustering methods.
In the following, we will briefly introduce distance-based clustering using two common similarity measures: Euclidean distance and Dynamic Time Warping (DTW).
12.2 Similarity Measures
nr_points = 50
interval_start = 0
interval_end = 20
t = np.linspace(interval_start, interval_end, nr_points)
# Generate two similar shaped time series
series1 = 0.8 + 0.9 * np.sin(0.9 + 0.9 * t) + 0.03 * np.random.randn(nr_points) - 0.2 * (t / (interval_end - interval_start))
series2 = np.sin(1.05 * t) + 0.15 * np.random.randn(nr_points)

def plot_time_series_plotly(t, series1, series2, title):
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=t, y=series1, mode='lines+markers', name='Series 1'))
    fig.add_trace(go.Scatter(x=t, y=series2, mode='lines+markers', name='Series 2'))
    fig.update_layout(
        title=title,
        xaxis_title='t',
        yaxis_title='Value',
        legend_title='Series'
    )
    return fig

12.2.1 Euclidean Distance
The Euclidean distance is the most straightforward similarity measure. It calculates the straight-line distance between two points. For time series data, this means comparing the values at each time point directly.
Having two time series \(X = (x_1, x_2, \ldots, x_n)\) and \(Y = (y_1, y_2, \ldots, y_n)\) of equal length \(n\), the Euclidean distance \(d\) between them is calculated as: \[d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}\]
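For two equal-length arrays, this formula can be computed directly in NumPy. A minimal sketch with toy data (not the series generated above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 2.5, 2.0, 4.0])

# Straight-line distance: compare the values at each time point directly,
# square the differences, sum them up, and take the square root
d = np.sqrt(np.sum((x - y) ** 2))
# equivalent shortcut: np.linalg.norm(x - y)
print(d)
```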
def plot_euclidean_distance(series1, series2, t):
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=t, y=series1, mode='lines+markers', name='Series 1'))
    fig.add_trace(go.Scatter(x=t, y=series2, mode='lines+markers', name='Series 2'))
    # Draw vertical lines showing the pointwise differences that enter the Euclidean distance
    for i, v in enumerate(t):
        fig.add_trace(go.Scatter(
            x=[v, v],
            y=[series1[i], series2[i]],
            mode='lines',
            line=dict(color='gray', dash='dash', width=1),
            showlegend=False
        ))
    fig.update_layout(
        title='Euclidean Distance Visualization Between Series 1 and Series 2',
        xaxis_title='t',
        yaxis_title='Value'
    )
    return fig

12.2.2 Dynamic Time Warping
Dynamic Time Warping (DTW) is a more advanced similarity measure that accounts for shifts and distortions in the time axis. It finds the optimal alignment between two time series by warping the time dimension, allowing for comparisons even when the series are out of phase or have different lengths.
def dtw_distance(s1, s2):
    n, m = len(s1), len(s2)
    # initialize the accumulated cost matrix
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(s1[i - 1] - s2[j - 1])
            cost[i, j] = dist + min(cost[i - 1, j],      # insertion
                                    cost[i, j - 1],      # deletion
                                    cost[i - 1, j - 1])  # match
    # backtrack the optimal warping path (just for visualization)
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
        costs = [cost[s] for s in steps]
        i, j = steps[np.argmin(costs)]
    path = path[::-1]
    return cost[n, m], path

d_dtw, path = dtw_distance(series1, series2)
print(f'Dynamic Time Warping Distance: {d_dtw:.2f}')

Dynamic Time Warping Distance: 21.82
Note that this DTW implementation uses a naive approach that computes the full cost matrix. Many variants exist that are more efficient and/or add constraints to the warping path. Lahreche and Boucheham (2021) provide a good overview.
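One common constraint is the Sakoe-Chiba band, which only allows the warping path to deviate from the diagonal by a fixed window. A sketch on top of the cost-matrix formulation above (the half-width `w` is a tunable parameter introduced here for illustration):

```python
import numpy as np

def dtw_distance_banded(s1, s2, w=5):
    """DTW restricted to a Sakoe-Chiba band of half-width w around the diagonal."""
    n, m = len(s1), len(s2)
    w = max(w, abs(n - m))  # the band must at least cover the length difference
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        # only cells within the band are filled; the rest stay at infinity
        for j in range(max(1, i - w), min(m, i + w) + 1):
            dist = abs(s1[i - 1] - s2[j - 1])
            cost[i, j] = dist + min(cost[i - 1, j],
                                    cost[i, j - 1],
                                    cost[i - 1, j - 1])
    return cost[n, m]
```

With `w` at least as large as the longer series this reduces to unconstrained DTW; smaller values cut the work from O(NM) to roughly O(N·w) and also prevent pathological alignments.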
def plot_dtw_alignment(s1, s2, t, path):
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=t, y=s1, mode='lines+markers', name='Series 1'))
    fig.add_trace(go.Scatter(x=t, y=s2, mode='lines+markers', name='Series 2'))
    # Draw alignment lines
    for (i, j) in path:
        fig.add_trace(go.Scatter(
            x=[t[i], t[j]],
            y=[s1[i], s2[j]],
            mode='lines',
            line=dict(color='gray', width=1, dash='dot'),
            showlegend=False
        ))
    fig.update_layout(
        title='Dynamic Time Warping Alignment Between Series 1 and Series 2',
        xaxis_title='t',
        yaxis_title='Value'
    )
    return fig

plot_dtw_alignment(series1, series2, t, path)

12.2.3 Preprocessing
In the previous plots we observe that noise, outliers and the positioning of the time series can have a significant impact on the similarity measures. To mitigate these effects, we can apply various preprocessing techniques such as:
- Smoothing: Applying filters (e.g., moving average, Gaussian) to reduce noise.
- Normalization: Scaling the time series to a common range to ensure that differences in amplitude do not dominate the similarity measure.
- Detrending: Removing trends to focus on the fluctuations around a mean level.
Note that the applied preprocessing techniques should be chosen based on the specific characteristics of the data and the analysis goals.
Here, we first smooth the time series using a simple moving average filter. This can easily be realized by convolution with a specific kernel (a vector consisting of equal weights summing to 1, where the length of the vector is equal to the desired window size).
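Such a moving average can be sketched with np.convolve (the window size is a free parameter; mode='same' keeps the output length equal to the input, at the price of edge effects at the boundaries):

```python
import numpy as np

def moving_average(series, window_size=5):
    # kernel of equal weights summing to 1
    kernel = np.ones(window_size) / window_size
    # 'same' keeps the output length; the first and last few values are
    # averages over a zero-padded, partially overlapping window
    return np.convolve(series, kernel, mode='same')
```

For a constant series the interior of the output is unchanged, while the boundary values are pulled toward zero by the padding; in practice the edges are often trimmed or padded with the edge values instead.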
Euclidean Distance: 2.26
12.3 k-Means Clustering
Just like the \(k\)-means algorithm for tabular data, the \(k\)-means algorithm for time series data aims to partition a set of time series into \(k\) clusters, where each time series belongs to the cluster with the nearest mean (centroid) time series.
The algorithm works as follows:
Initialization: Randomly select \(k\) initial centroids from the time series dataset.
Assignment Step: Assign each time series to the cluster whose centroid is closest, based on a chosen similarity measure (e.g., Euclidean distance, Dynamic Time Warping).
Update Step: Recalculate the centroids of the clusters by taking the mean of all time series assigned to each cluster.
Repeat: Repeat the assignment and update steps until convergence (i.e., when assignments no longer change or a maximum number of iterations is reached).
In practice, the method for calculating the centroid of a cluster needs to be compatible with the chosen similarity measure. Especially for non-Euclidean measures that handle time series of variable length, like Dynamic Time Warping, specialized methods may be used to compute the centroid.
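The steps above can be sketched for the Euclidean case with equal-length series, where the centroid is simply the pointwise mean (a toy implementation for illustration, not the library code used below):

```python
import numpy as np

def kmeans_time_series(X, k, n_iter=100, seed=0):
    """Naive k-means for equal-length time series (X: shape (n_series, length))
    using the Euclidean distance and the pointwise mean as centroid."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k of the series as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # 2. Assignment: each series joins the cluster of its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # 4. Converged: assignments no longer change
        labels = new_labels
        # 3. Update: the centroid is the pointwise mean of the cluster members
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids
```

Swapping the distance in the assignment step for DTW is straightforward, but then the pointwise mean in the update step is no longer a sensible centroid, which is exactly the incompatibility discussed above.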
Here we will have a look at an example using the sktime library. The dataset we will use is a subset of the Trace dataset, a synthetic dataset introduced by Roverso (2002) for the purpose of plant diagnostics. The subset includes only 4 classes of univariate time series.
import numpy as np
import plotly.graph_objs as go
import plotly.io as pio
from sklearn.preprocessing import StandardScaler
from sktime.clustering.k_means import TimeSeriesKMeansTslearn
from tslearn.datasets import CachedDatasets
pio.renderers.default = "notebook" # set the default plotly renderer to "notebook" (necessary for quarto to render the plots)

We have 200 time series in total, each of length 275.
Note that we also have class labels, which is not the case in real clustering problems. We will solely use them in the end to evaluate the clustering performance.
Without class labels it is hard to determine the number of classes in the data, but we can see that there are some patterns in the data.
Since normalization and scaling are important for distance-based methods, we will use the StandardScaler from sklearn to standardize the data to zero mean and unit variance.
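The same zero-mean, unit-variance transformation can also be written directly in NumPy on a per-series basis (a sketch assuming a 2D array of shape `(n_series, series_length)`; the chapter itself uses sklearn's StandardScaler):

```python
import numpy as np

def standardize_per_series(X):
    # X has shape (n_series, series_length); each row is scaled independently
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    return (X - mean) / std
```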
Let us also just visualize a few time series from the dataset to get a better idea of the data.
COLORS = ['#A6CEE3', '#B2DF8A', '#FDBF6F', '#CAB2D6'] # pastel, colorblind-friendly
sampled_ids = [0, 10, 25, 81]
fig = go.Figure()
for idx, i in enumerate(sampled_ids):
fig.add_trace(go.Scatter(
y=X[i, :, 0],
mode='lines',
line=dict(width=3, color=COLORS[idx]),
name=f"Sample {i}"
))
fig.update_layout(title='Sampled Time Series from X', xaxis_title='Time', yaxis_title='Value', height=400)
fig.show()

12.3.1 Euclidean Distance Example
In this example we know that there are 4 classes in the data, so we will set \(k=4\).
Luckily, tslearn already implements a variety of clustering algorithms that we can use out of the box, including the \(k\)-means algorithm.
Helper function for plotting clusters and cluster centers
def plot_clusters(X, y, title):
    fig = go.Figure()
    for cluster_idx, cluster in enumerate(sorted(set(y))):
        idx = np.where(y == cluster)[0]
        show_legend = True  # Only show legend for the first trace of each cluster
        for i in idx:
            fig.add_trace(go.Scatter(
                y=X[i, :, 0],
                mode='lines',
                line=dict(width=1, color=COLORS[cluster_idx]),
                opacity=0.5,
                name=f'Cluster {cluster_idx + 1}' if show_legend else None,
                legendgroup=cluster_idx,
                showlegend=show_legend
            ))
            show_legend = False
    fig.update_layout(
        title=title,
        xaxis_title='Time',
        yaxis_title='Value',
        height=500
    )
    fig.show()

def plot_centroids(clusterer):
    fig = go.Figure()
    for cluster_idx, centroid in enumerate(clusterer.cluster_centers_):
        fig.add_trace(go.Scatter(
            y=centroid[:, 0],
            mode='lines',
            line=dict(width=3, color=COLORS[cluster_idx]),
            name=f'Centroid {cluster_idx + 1}'
        ))
    fig.update_layout(
        title='Cluster Centroids',
        xaxis_title='Time',
        yaxis_title='Value',
        height=400
    )
    fig.show()

Next, we plot the \(k\)-means clusters. Remember that the legend is clickable.
In comparison, here are the true classes according to the labels in the dataset.
The four classes from the original dataset can be described as follows:
- Class A (Cluster 1 in Figure): Time series that start high, rise to a huge peak around the middle, fall back to low and then gradually rise to high again.
- Class B (Cluster 2 in Figure): Time series that start high, drop to a low point around the middle, and then rise back up.
- Class C (Cluster 3 in Figure): Time series that start low, quickly rise to high around the middle, and then show some oscillations on a high plateau.
- Class D (Cluster 4 in Figure): Similar to Class C, but without oscillations.
The clusters found by the \(k\)-means algorithm do not correspond well to these classes:
- Class A and B are mixed up in Cluster 2.
- Class C and D are mixed up in Clusters 1, 3, and 4.
This is due to the fact that the \(k\)-means algorithm minimizes the within-cluster distances based on the Euclidean distance, which does not necessarily correspond to the true classes in the data.
This also reflects when we visualize the cluster centers.
Next, let us check how the clustering performs with dynamic time warping as distance measure.
12.3.2 Dynamic Time Warping
While the Euclidean variant uses the pointwise mean for computing the centroid, the dynamic time warping variant uses a more complex method called soft-DTW barycenter averaging, which is compatible with the DTW distance measure.
Soft-DTW, a differentiable version of DTW, enables the use of gradient-based optimization techniques for computing the barycenter. Its differentiable nature also facilitates its integration as a loss function into machine learning algorithms, in particular neural networks.
More information on soft-DTW can be found in the related paper Cuturi and Blondel (2017).
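The core idea can be sketched by replacing the hard minimum in the DTW recursion with a smooth soft-minimum controlled by a parameter gamma (a naive sketch with a squared-error ground cost, as in the paper; for small gamma the result approaches the hard DTW cost):

```python
import numpy as np

def softmin(a, b, c, gamma):
    # smooth, differentiable replacement for min(a, b, c)
    vals = -np.array([a, b, c]) / gamma
    vmax = vals.max()  # stabilize the log-sum-exp numerically
    return -gamma * (vmax + np.log(np.exp(vals - vmax).sum()))

def soft_dtw(s1, s2, gamma=1.0):
    n, m = len(s1), len(s2)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = (s1[i - 1] - s2[j - 1]) ** 2  # squared-error ground cost
            R[i, j] = dist + softmin(R[i - 1, j], R[i, j - 1], R[i - 1, j - 1], gamma)
    return R[n, m]
```

Because the soft-minimum blends over all alignment paths, the result can even be slightly negative for identical series; larger gamma gives a smoother but more biased approximation.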
Have a look at the execution times of both algorithms. While DTW is executed in parallel (n_jobs=-1), it is still significantly slower than the Euclidean distance version. The reason for this is that the DTW algorithm has a time complexity of \(O(NM)\), where \(N\) and \(M\) are the lengths of the two time series to be compared. More sophisticated algorithms are able to slightly reduce this, but so far the runtime complexity stays quadratic.
However, the clustering results look better now.
Cluster 2 matches class A very well, cluster 4 matches class B. However, cluster 1 and 3 still contain a mix of class C and D.
12.4 Elbow method for determining the number of clusters
One open question is how to determine the number of clusters \(k\). A common method is the elbow method, which plots the sum of distances to the nearest cluster center for different values of \(k\). The idea is to choose the value of \(k\) at which the rate of decrease sharply shifts, forming an elbow shape in the plot.
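The plotting code below assumes that `K` and `sum_of_distances` have already been computed. Such a loop can be sketched with scikit-learn's KMeans on flattened, equal-length series as a simple Euclidean stand-in (the toy data here is illustrative, not the Trace subset):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# toy stand-in data: 60 series of length 30 drawn from three distinct patterns
# (the chapter's dataset could be used instead, flattened to a 2D array)
patterns = [np.sin(np.linspace(0, 6, 30)),
            np.cos(np.linspace(0, 6, 30)),
            np.linspace(-1, 1, 30)]
X_toy = np.vstack([p + 0.05 * rng.standard_normal(30)
                   for p in patterns for _ in range(20)])

K = range(2, 8)
sum_of_distances = []
for k in K:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_toy)
    # inertia_ is the sum of squared distances to the nearest cluster center
    sum_of_distances.append(km.inertia_)
```

The curve drops sharply up to the true number of patterns and flattens afterwards, which is the elbow we look for.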
import plotly.graph_objs as go

fig = go.Figure()
fig.add_trace(go.Scatter(
    x=list(K),
    y=sum_of_distances,
    mode='lines+markers',
    marker=dict(color='blue'),
    line=dict(color='blue'),
    name='Sum of distances'
))
fig.update_layout(
    title='Elbow Method For Optimal k',
    xaxis_title='k',
    yaxis_title='Sum of distances'
)
fig.show()

The elbow method suggests that four clusters is a good choice for \(k\), which matches the true number of classes in the data.
Often it makes sense to start with a larger number of clusters and then merge similar clusters later on. This makes it possible to capture more subtle patterns in the data.
12.5 Additional readings
tslearn dtw documentation (accessed: 29 09 2025) briefly introduces dynamic time warping, barycenters and soft-DTW. To learn more about different methods used to calculate barycenters, have a look at the tslearn barycenter documentation (accessed: 29 09 2025).
To learn more about dynamic time warping, the wikipedia article on dynamic time warping (accessed: 29 09 2025) is a good starting point.
\(k\)-means clustering in time series is still a very active research area. A recent preprint by Holder, Bagnall, and Lines (2024) gives a nice overview over the variants of \(k\)-means for time series and discusses their pros and cons.